Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat(e2e): support multiple aggregators in the e2e tests #2378

Open
wants to merge 33 commits into
base: main
Choose a base branch
from

Conversation

jpraynaud
Copy link
Member

@jpraynaud jpraynaud commented Mar 18, 2025

Content

This PR includes the adaptation of the e2e tests to support multiple aggregators:

  • The relay has been updated to support both Passthrough (messages are sent to the configured aggregator endpoint) and P2P (messages are sent to the P2P network) modes for both the signer registration and signature registration. The configuration options have been updated in that sense
  • The end to end test configuration has evolved:
    • number_of_aggregators and number_of_signers are specified instead of number_of_pool_nodes
    • use_p2pmode has been replaced by more appropriate use_relays
    • relay_signer_registration_mode and relay-signature_registration_mode have been added (used with the use_relays option)
  • RunOnly mode of the e2e test has been adapted to support concurrently multiple aggregators
  • Spec mode of the e2e test has been adapted to support concurrently multiple aggregators
  • Slave registration of the aggregator has been fixed as it was not targeting the correct verification keys. The associated integration test has been rewritten for finer testing of evolving Mithril stake distribution for each epoch.

Pre-submit checklist

  • Branch
    • Tests are provided (if possible)
    • Crates versions are updated (if relevant)
    • CHANGELOG file is updated (if relevant)
    • Commit sequence broadly makes sense
    • Key commits have useful messages
  • PR
    • All check jobs of the CI have succeeded
    • Self-reviewed the diff
    • Useful pull request description
    • Reviewer requested
  • Documentation
    • No new TODOs introduced

Issue(s)

Closes #2361

@jpraynaud jpraynaud self-assigned this Mar 18, 2025
Copy link

github-actions bot commented Mar 18, 2025

Test Results

    3 files  ±0     57 suites  ±0   11m 45s ⏱️ +16s
1 776 tests +2  1 776 ✅ +2  0 💤 ±0  0 ❌ ±0 
2 174 runs  +2  2 174 ✅ +2  0 💤 ±0  0 ❌ ±0 

Results for commit b48abad. ± Comparison against base commit 12cd9c8.

♻️ This comment has been updated with latest results.

@jpraynaud jpraynaud force-pushed the jpraynaud/2361-e2e-test-slave-aggregator branch from 99077bb to 9ad40fc Compare March 18, 2025 16:50
@jpraynaud jpraynaud temporarily deployed to testing-preview March 18, 2025 17:01 — with GitHub Actions Inactive
@jpraynaud jpraynaud force-pushed the jpraynaud/2361-e2e-test-slave-aggregator branch 5 times, most recently from cf5d195 to 7c6a300 Compare March 19, 2025 17:47
@jpraynaud jpraynaud temporarily deployed to testing-preview March 19, 2025 17:58 — with GitHub Actions Inactive
@jpraynaud jpraynaud force-pushed the jpraynaud/2361-e2e-test-slave-aggregator branch 5 times, most recently from de78895 to 42b4d01 Compare March 20, 2025 18:34
@jpraynaud jpraynaud marked this pull request as ready for review March 20, 2025 18:35
Copy link
Collaborator

@sfauvel sfauvel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
Just some remarks

@@ -940,7 +940,7 @@ mod tests {
runner
.expect_inform_new_epoch()
.with(predicate::eq(new_time_point_clone.clone().epoch))
.once()
.times(2)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we need to change number of calls ?
Modification on the state_machine seems to concern only the slave mode.
Does it mean we are running a slave ?
Test name say that it is a master: "idle_new_epoch_detected_and_master_has_transitioned_to_epoch"

))),
}
}
SignerRelayMode::Passthrough => {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no test in this file about this code.
Are tests useless here?
Is it tested elsewhere ?

Copy link
Collaborator

@dlachaume dlachaume left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

// This should be removed when the aggregator is able to synchronize its certificate chain from another aggregator
if !aggregator.is_first() {
tokio::time::sleep(std::time::Duration::from_millis(
5 * aggregator.mithril_run_interval() as u64,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe extracting the hardcoded value 5 into a named constant would improve readability and maintability?

Copy link
Collaborator

@Alenar Alenar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with a few caveats.

Comment on lines 272 to +279
self.runner.update_epoch_settings().await?;
if self.config.is_slave {
self.runner
.synchronize_slave_aggregator_signer_registration()
.await?;
// Needed to recompute epoch data for the next signing round on the slave
self.runner.inform_new_epoch(new_time_point.epoch).await?;
}
Copy link
Collaborator

@Alenar Alenar Mar 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain how this change help to stabilize the e2e tests ? I'm quite puzzled over the fact that we need to call runner.inform_new_epoch twice.

From what I understand this doesn't impact the methods called between the inform_new_epoch calls:

  • runner.upkeep call should not be impacted
  • open_signer_registration_round do nothing on slave
  • update_epoch_settings should not be impacted as the data registered by the epoch service (protocol parameters and transactions signing config) don't depends on the master aggregator

The functional impacts should be:

  • epoch service will expose an incorrect list of next_signers in the interval between the two inform_new_epoch calls
  • epoch service will be ready earlier since a first inform_epoch calls will be done without needing a roundtrip to the master aggregator

Is the last point the problem on fast network ? Maybe the synchronizer should be able to "edit" the next signers in the epoch_service instead ?

@jpraynaud jpraynaud force-pushed the jpraynaud/2361-e2e-test-slave-aggregator branch 2 times, most recently from b464797 to 9e47514 Compare March 21, 2025 17:55
@Alenar Alenar force-pushed the jpraynaud/2361-e2e-test-slave-aggregator branch from 9e47514 to be638e5 Compare March 24, 2025 12:19
jpraynaud and others added 25 commits March 24, 2025 18:17
By providing information about the targeted aggregator in logs and errors.
Which could prevent signature from signers even with loose protocol parameters.
Which can be 'Passthrough' or 'P2P'.
As master/slave signer registration is only one of the configurations to be tested.
Until we can fix the source of flakiness.
- Removed last epoch which was not necessary
- Removed unnecessary cycles
- Reduced the number of signers per epoch
- Use of 'checked_sub' in the 'EpochFixturesMapBuilder'.
@jpraynaud jpraynaud force-pushed the jpraynaud/2361-e2e-test-slave-aggregator branch from be638e5 to 3794e05 Compare March 24, 2025 17:18
@jpraynaud jpraynaud force-pushed the jpraynaud/2361-e2e-test-slave-aggregator branch from 3794e05 to b48abad Compare March 24, 2025 18:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

E2e tests adaptation for multiple aggregators
4 participants